We conducted a detailed analysis of Larceny/Theft incidents because they are the most common category, accounting for about 22.4% of all reported incidents. Most of these incidents have no recorded resolution ('NONE'); among those that do, an arrest is by far the most frequent outcome. To analyze these incidents, we filtered out the ones with a resolution of 'NONE' and focused on those whose resolution is 'ARREST, BOOKED' or 'ARREST, CITED'. We also created a breakdown by PdDistrict to see which districts are performing better in addressing these crimes.
Upon analyzing the data, we noticed that the district proportions of Larceny/Theft incidents have remained relatively stable over time. This leads us to believe that the police department's efforts to address these crimes have not significantly affected their overall incidence.
# Filter the dataframe to Larceny/Theft incidents whose resolution is an arrest
# ('ARREST, BOOKED' or 'ARREST, CITED'); .copy() avoids SettingWithCopyWarning on the assignments below
larceny_theft_arrest_df = df_crimes[(df_crimes['Category'] == 'LARCENY/THEFT') & (df_crimes['Resolution'].str.contains('ARREST', na=False))].copy()
# Convert the 'Date' column to a datetime data type
larceny_theft_arrest_df['Date'] = pd.to_datetime(larceny_theft_arrest_df['Date'], format='%m/%d/%Y')
# Create a new column called 'Year' by extracting the year from the 'Date' column
larceny_theft_arrest_df['Year'] = larceny_theft_arrest_df['Date'].dt.year
# Replace missing 'PdDistrict' values with 'Unknown'
larceny_theft_arrest_df['PdDistrict'] = larceny_theft_arrest_df['PdDistrict'].fillna('Unknown')
# Group the data by year and PdDistrict, and count the number of incidents in each group
yearly_pd_counts = larceny_theft_arrest_df.groupby(['Year', 'PdDistrict']).size().reset_index(name='TotalIncidents')
# Pivot the data to create a matrix with years as rows, PdDistricts as columns, and incident counts as values
yearly_pd_pivot = yearly_pd_counts.pivot(index='Year', columns='PdDistrict', values='TotalIncidents').fillna(0)
# Create a percentage version of the pivot table
yearly_pd_pct = yearly_pd_pivot.apply(lambda x: x/x.sum()*100, axis=1)
# Plot a yearly stacked bar chart with a PdDistrict breakdown
yearly_pd_pct.plot(kind='bar', stacked=True, figsize=(12,6))
plt.xlabel('Year')
plt.ylabel('Percentage of incidents')
plt.title('Percentage of Larceny/Theft incidents that involved an arrest by PdDistrict and year')
plt.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
plt.show()
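The claim that district proportions stay "relatively stable" can be checked numerically. The following is a minimal sketch on a made-up toy dataframe (the `toy` data and the `share_range` name are assumptions for illustration, not part of the notebook): `pd.crosstab` with `normalize='index'` produces the same row-wise percentages as the pivot above, and the spread of each district's share across years gives a rough stability measure.

```python
import pandas as pd

# Toy stand-in for the arrest subset; the notebook would use its own dataframe here.
toy = pd.DataFrame({
    "Year": [2015, 2015, 2015, 2015, 2016, 2016, 2016, 2016],
    "PdDistrict": ["CENTRAL", "CENTRAL", "MISSION", "SOUTHERN",
                   "CENTRAL", "MISSION", "MISSION", "SOUTHERN"],
})

# Row-normalised cross-tabulation: each year's row sums to 100.
share_pct = pd.crosstab(toy["Year"], toy["PdDistrict"], normalize="index") * 100

# Spread of each district's share across years, in percentage points.
# Small values mean the district's share barely moved.
share_range = share_pct.max() - share_pct.min()

print(share_pct.round(1))
print(share_range.round(1))
```

A district whose range is only a few percentage points over the whole period would support the "stable" reading; a large range would call for a closer look at that district.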
In conclusion, by using the map and analyzing the concentration of crime across police districts, we obtained insights into patterns and trends in the data. These can inform recommendations to improve public safety in San Francisco and help the police better understand crime trends in the city.
# Display the map
SF_map
LARCENY/THEFT is the most prevalent category, accounting for about 22.4% of all incidents.
# Group the data by category and count the number of incidents in each group
category_counts = df_crimes.groupby('Category').size().reset_index(name='TotalIncidents')
# Calculate the total number of incidents in the dataset
total_incidents = category_counts['TotalIncidents'].sum()
# Calculate the percentage of incidents for each category
category_counts['Percentage'] = category_counts['TotalIncidents'] / total_incidents * 100
# Sort the data by percentage in descending order
category_counts = category_counts.sort_values('Percentage', ascending=False)
# Print the results
category_counts[['Category', 'Percentage']].head(5)
| | Category | Percentage |
|---|---|---|
| 15 | LARCENY/THEFT | 22.445146 |
| 20 | OTHER OFFENSES | 14.175649 |
| 19 | NON-CRIMINAL | 11.125861 |
| 1 | ASSAULT | 7.844097 |
| 34 | VEHICLE THEFT | 5.927519 |
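The group, count, divide, and sort steps above can be condensed with `value_counts(normalize=True)`. This is a minimal sketch on toy data; the helper name `top_shares` and the `toy` dataframe are hypothetical and only illustrate the idea.

```python
import pandas as pd

def top_shares(df, column, n=5):
    """Return the n most frequent values of `column` with their share in percent."""
    # value_counts sorts in descending order; normalize=True returns fractions.
    shares = df[column].value_counts(normalize=True).mul(100)
    return shares.head(n).rename("Percentage").rename_axis(column).reset_index()

# Toy data: 3 of 5 incidents are LARCENY/THEFT.
toy = pd.DataFrame({
    "Category": ["LARCENY/THEFT", "LARCENY/THEFT", "LARCENY/THEFT",
                 "ASSAULT", "VEHICLE THEFT"],
})

result = top_shares(toy, "Category", n=2)
print(result)
```

The same helper would work for any categorical column, for example `top_shares(df_crimes, 'Descript')`, assuming `df_crimes` is loaded as in the notebook.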
The most common incident description is GRAND THEFT FROM LOCKED AUTO, at about 8.4% of all incidents.
# Group the data by description and count the number of incidents in each group
description_counts = df_crimes.groupby('Descript').size().reset_index(name='TotalIncidents')
# Calculate the total number of incidents in the dataset
total_incidents = description_counts['TotalIncidents'].sum()
# Calculate the percentage of incidents for each description
description_counts['Percentage'] = description_counts['TotalIncidents'] / total_incidents * 100
# Sort the data by percentage in descending order
description_counts = description_counts.sort_values('Percentage', ascending=False)
# Print the results
description_counts[['Descript', 'Percentage']].head(5)
| | Descript | Percentage |
|---|---|---|
| 388 | GRAND THEFT FROM LOCKED AUTO | 8.395112 |
| 455 | LOST PROPERTY | 3.660300 |
| 134 | BATTERY | 3.114356 |
| 707 | STOLEN AUTOMOBILE | 3.034292 |
| 271 | DRIVERS LICENSE, SUSPENDED OR REVOKED | 2.928916 |
This stacked bar chart displays the number of Larceny/Theft incidents in San Francisco per year, grouped by broad resolution categories. The categories include Arrest (combining Arrest, Booked and Arrest, Cited), Prosecuted (combining Prosecuted by Outside Agency and Prosecuted for Lesser Offense), and the remaining resolutions unchanged. The chart highlights the trends in these grouped resolutions over the period covered by the dataset (2003 to May 2018).
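# Filter the dataframe to only include incidents in the 'Larceny/Theft' category
# (.copy() creates an independent dataframe, so the column assignments below do not raise SettingWithCopyWarning)
larceny_theft_df = df_crimes[df_crimes['Category'] == 'LARCENY/THEFT'].copy()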
# Convert the 'Date' column to a datetime data type
larceny_theft_df['Date'] = pd.to_datetime(larceny_theft_df['Date'], format='%m/%d/%Y')
# Create a new column called 'Year' by extracting the year from the 'Date' column
larceny_theft_df['Year'] = larceny_theft_df['Date'].dt.year
# Create a new column called 'Month' by extracting the month from the 'Date' column
larceny_theft_df['Month'] = larceny_theft_df['Date'].dt.month_name()
# Map sub-categories to broader categories
resolution_map = {
    'ARREST, BOOKED': 'ARREST',
    'ARREST, CITED': 'ARREST',
    'PROSECUTED BY OUTSIDE AGENCY': 'PROSECUTED',
    'PROSECUTED FOR LESSER OFFENSE': 'PROSECUTED'
}
larceny_theft_df['Resolution'] = larceny_theft_df['Resolution'].map(resolution_map).fillna(larceny_theft_df['Resolution'])
# Group the data by year and resolution, and count the number of incidents in each group
yearly_resolutions = larceny_theft_df.groupby(['Year', 'Resolution']).size().reset_index(name='TotalIncidents')
# Pivot the data to create a matrix with years as rows, resolutions as columns, and incident counts as values
yearly_resolutions_pivot = yearly_resolutions.pivot(index='Year', columns='Resolution', values='TotalIncidents').fillna(0)
# Plot a stacked bar chart showing the number of incidents for each resolution over time
yearly_resolutions_pivot.plot(kind='bar', stacked=True, figsize=(12,6))
plt.xlabel('Year')
plt.ylabel('Number of incidents')
plt.title('Larceny/Theft incidents by resolution')
plt.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
plt.show()
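The filtering and resolution-grouping steps above can be sketched in isolation on toy data. The `toy` dataframe and the `ResolutionGroup` column name are assumptions for illustration. Two details are worth noting: `.copy()` after the boolean filter avoids `SettingWithCopyWarning`, and `map(...).fillna(...)` keeps any resolution that is not in the mapping unchanged.

```python
import pandas as pd

toy = pd.DataFrame({
    "Category": ["LARCENY/THEFT", "LARCENY/THEFT", "ASSAULT", "LARCENY/THEFT"],
    "Resolution": ["ARREST, BOOKED", "NONE", "ARREST, CITED",
                   "PROSECUTED BY OUTSIDE AGENCY"],
})

resolution_map = {
    "ARREST, BOOKED": "ARREST",
    "ARREST, CITED": "ARREST",
    "PROSECUTED BY OUTSIDE AGENCY": "PROSECUTED",
    "PROSECUTED FOR LESSER OFFENSE": "PROSECUTED",
}

# .copy() gives an independent dataframe, so adding a column is unambiguous.
subset = toy[toy["Category"] == "LARCENY/THEFT"].copy()

# map() returns NaN for values missing from the dict; fillna() restores the original label.
subset["ResolutionGroup"] = subset["Resolution"].map(resolution_map).fillna(subset["Resolution"])

print(subset)
```

Writing the grouped labels to a separate column, rather than overwriting 'Resolution', keeps the original values available and makes it harmless to re-run the cell.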
The large majority of Larceny/Theft incidents (about 91.6%) have NONE as the resolution; only about 7.4% ended in an arrest.
larceny_theft_df['Resolution'] = larceny_theft_df['Resolution'].map(resolution_map).fillna(larceny_theft_df['Resolution'])
# Group the data by resolution and count the number of incidents in each group
resolution_counts = larceny_theft_df.groupby('Resolution').size().reset_index(name='TotalIncidents')
# Calculate the total number of incidents in the "Larceny/Theft" category
total_incidents = resolution_counts['TotalIncidents'].sum()
# Calculate the percentage of incidents for each resolution
resolution_counts['Percentage'] = resolution_counts['TotalIncidents'] / total_incidents * 100
# Sort the data by percentage in descending order
resolution_counts = resolution_counts.sort_values('Percentage', ascending=False)
# Print the results
print(resolution_counts[['Resolution', 'Percentage']])
| | Resolution | Percentage |
|---|---|---|
| 5 | NONE | 91.621319 |
| 0 | ARREST | 7.447879 |
| 6 | NOT PROSECUTED | 0.334746 |
| 9 | UNFOUNDED | 0.255453 |
| 1 | COMPLAINANT REFUSES TO PROSECUTE | 0.098541 |
| 2 | DISTRICT ATTORNEY REFUSES TO PROSECUTE | 0.097704 |
| 3 | EXCEPTIONAL CLEARANCE | 0.092473 |
| 7 | PROSECUTED | 0.032847 |
| 8 | PSYCHOPATHIC CASE | 0.009833 |
| 4 | LOCATED | 0.009206 |
# Group the data by sub-category and count the number of incidents in each group
subcategory_counts = larceny_theft_df.groupby('Descript').size().reset_index(name='TotalIncidents')
# Calculate the total number of incidents in the "Larceny/Theft" category
total_incidents = subcategory_counts['TotalIncidents'].sum()
# Calculate the percentage of incidents for each sub-category
subcategory_counts['Percentage'] = subcategory_counts['TotalIncidents'] / total_incidents * 100
# Sort the data by percentage in descending order
subcategory_counts = subcategory_counts.sort_values('Percentage', ascending=False)
# Print the results
subcategory_counts[['Descript', 'Percentage']]
| | Descript | Percentage |
|---|---|---|
| 19 | GRAND THEFT FROM LOCKED AUTO | 37.402793 |
| 36 | PETTY THEFT FROM LOCKED AUTO | 10.860819 |
| 39 | PETTY THEFT OF PROPERTY | 9.634186 |
| 23 | GRAND THEFT OF PROPERTY | 6.150949 |
| 35 | PETTY THEFT FROM A BUILDING | 5.360741 |
| ... | ... | ... |
| 58 | THEFT, DRUNK ROLL, ATT. | 0.000837 |
| 59 | THEFT, GRAND, AGRICULTURAL | 0.000837 |
| 29 | LOOTING DURING STATE OF EMERGENCY | 0.000837 |
| 12 | ATTEMPTED THEFT PHONE BOOTH | 0.000628 |
| 50 | THEFT, ANIMAL, ATT. | 0.000418 |
63 rows × 2 columns
Here we filter out the NONE resolution so that the remaining resolutions are easier to see in the chart.
larceny_theft_df['Resolution'] = larceny_theft_df['Resolution'].map(resolution_map).fillna(larceny_theft_df['Resolution'])
# Filter out the 'NONE' sub-category
larceny_theft_df = larceny_theft_df[larceny_theft_df['Resolution'] != 'NONE']
# Group the data by year and resolution, and count the number of incidents in each group
yearly_resolutions = larceny_theft_df.groupby(['Year', 'Resolution']).size().reset_index(name='TotalIncidents')
# Pivot the data to create a matrix with years as rows, resolutions as columns, and incident counts as values
yearly_resolutions_pivot = yearly_resolutions.pivot(index='Year', columns='Resolution', values='TotalIncidents').fillna(0)
# Plot a stacked bar chart showing the number of incidents for each resolution over time
yearly_resolutions_pivot.plot(kind='bar', stacked=True, figsize=(12,6))
plt.xlabel('Year')
plt.ylabel('Number of incidents')
plt.title('Larceny/Theft incidents by resolution (excluding NONE)')
plt.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
plt.show()
# Filter the dataframe to only include incidents in the 'Larceny/Theft' category with resolution of 'NONE'
# (.copy() avoids SettingWithCopyWarning on the assignments below)
larceny_theft_none_df = df_crimes[(df_crimes['Category'] == 'LARCENY/THEFT') & (df_crimes['Resolution'] == 'NONE')].copy()
# Convert the 'Date' column to a datetime data type
larceny_theft_none_df['Date'] = pd.to_datetime(larceny_theft_none_df['Date'], format='%m/%d/%Y')
# Create a new column called 'Year' by extracting the year from the 'Date' column
larceny_theft_none_df['Year'] = larceny_theft_none_df['Date'].dt.year
# Replace missing 'PdDistrict' values with 'Unknown'
larceny_theft_none_df['PdDistrict'] = larceny_theft_none_df['PdDistrict'].fillna('Unknown')
# Group the data by year and PdDistrict, and count the number of incidents in each group
yearly_pd_counts = larceny_theft_none_df.groupby(['Year', 'PdDistrict']).size().reset_index(name='TotalIncidents')
# Pivot the data to create a matrix with years as rows, PdDistricts as columns, and incident counts as values
yearly_pd_pivot = yearly_pd_counts.pivot(index='Year', columns='PdDistrict', values='TotalIncidents').fillna(0)
# Plot a yearly bar chart with a PdDistrict breakdown
yearly_pd_pivot.plot(kind='bar', stacked=True, figsize=(12,6))
plt.xlabel('Year')
plt.ylabel('Number of incidents')
plt.title('Larceny/Theft incidents with resolution of NONE by PdDistrict and year')
plt.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
plt.show()
# Filter the dataframe to only include incidents in the 'Larceny/Theft' category with resolution of 'NONE'
# (.copy() avoids SettingWithCopyWarning on the assignments below)
larceny_theft_none_df = df_crimes[(df_crimes['Category'] == 'LARCENY/THEFT') & (df_crimes['Resolution'] == 'NONE')].copy()
# Convert the 'Date' column to a datetime data type
larceny_theft_none_df['Date'] = pd.to_datetime(larceny_theft_none_df['Date'], format='%m/%d/%Y')
# Create a new column called 'Year' by extracting the year from the 'Date' column
larceny_theft_none_df['Year'] = larceny_theft_none_df['Date'].dt.year
# Replace missing 'PdDistrict' values with 'Unknown'
larceny_theft_none_df['PdDistrict'] = larceny_theft_none_df['PdDistrict'].fillna('Unknown')
# Group the data by year and PdDistrict, and count the number of incidents in each group
yearly_pd_counts = larceny_theft_none_df.groupby(['Year', 'PdDistrict']).size().reset_index(name='TotalIncidents')
# Pivot the data to create a matrix with years as rows, PdDistricts as columns, and incident counts as values
yearly_pd_pivot = yearly_pd_counts.pivot(index='Year', columns='PdDistrict', values='TotalIncidents').fillna(0)
# Convert the counts to percentages
yearly_pd_pivot_percent = yearly_pd_pivot.apply(lambda x: x/x.sum(), axis=1)*100
# Plot a percentage stacked bar chart with a PdDistrict breakdown
yearly_pd_pivot_percent.plot(kind='bar', stacked=True, figsize=(12,6))
plt.xlabel('Year')
plt.ylabel('Percentage of incidents')
plt.title('Larceny/Theft incidents with resolution of NONE by PdDistrict and year')
plt.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
plt.show()
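District shares of unresolved incidents largely reflect where incidents occur, so they say little about how each district performs. One possible complement, sketched here on toy data and not taken from the notebook, is a per-district arrest rate: arrests divided by all Larceny/Theft incidents in that district. The `toy` dataframe and the `IsArrest` and `ArrestRatePct` names are assumptions for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "PdDistrict": ["CENTRAL", "CENTRAL", "CENTRAL", "CENTRAL",
                   "MISSION", "MISSION", None],
    "Resolution": ["NONE", "NONE", "NONE", "ARREST, BOOKED",
                   "NONE", "ARREST, CITED", "NONE"],
})

# Same handling of missing districts as in the notebook cells.
toy["PdDistrict"] = toy["PdDistrict"].fillna("Unknown")

# Flag incidents resolved by an arrest ('ARREST, BOOKED' or 'ARREST, CITED').
toy["IsArrest"] = toy["Resolution"].str.startswith("ARREST")

# Count incidents and arrests per district, then derive the rate in percent.
summary = toy.groupby("PdDistrict").agg(
    TotalIncidents=("IsArrest", "count"),
    Arrests=("IsArrest", "sum"),
)
summary["ArrestRatePct"] = summary["Arrests"] / summary["TotalIncidents"] * 100

print(summary)
```

Adding 'Year' to the `groupby` keys would give the same rate per district and year, which could be compared with the stacked charts above.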
In this part we analyze crime data from May 2017 to May 2018, focusing specifically on Larceny/Theft incidents.
First, we import the necessary libraries, including Folium and Pandas. Folium is a Python library for creating interactive maps, and Pandas is a data manipulation library.
# Import the necessary libraries
import folium
import pandas as pd
from folium.plugins import HeatMapWithTime
Next, we import the crime dataset into the Jupyter Notebook using Pandas. We also parse the Date and Time columns into a single datetime column (Date_Time) for easier data analysis.
# Import the dataset
df = pd.read_csv('data/Police_Department_Incident_Reports__Historical_2003_to_May_2018.csv', parse_dates=[['Date', 'Time']])
# Filter the dataset to only include larceny and theft incidents from 2017-05-31 up to (but not including) 2018-05-31.
df_larceny_theft = df.loc[(df['Date_Time'] >= "2017-05-31") &
(df['Date_Time'] < "2018-05-31") &
(df['Category'] == "LARCENY/THEFT")].reset_index(drop=True)
# Create a list of latitude and longitude pairs for each incident
x_y = [(lat, lon) for lat, lon in zip(df_larceny_theft["Y"], df_larceny_theft["X"])]
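The date filter and coordinate extraction above can be illustrated on a small toy dataframe. This is a sketch under assumptions: the `toy` data is made up, and dropping rows with missing coordinates is an extra precaution that the notebook cell does not take. The window is half-open, so the start date is included and the end date is excluded.

```python
import pandas as pd

toy = pd.DataFrame({
    "Date_Time": pd.to_datetime(["2017-05-30 23:00", "2017-05-31 00:00",
                                 "2018-05-30 12:00", "2018-05-31 00:00"]),
    "Category": ["LARCENY/THEFT"] * 4,
    "X": [-122.42, -122.41, -122.40, -122.39],   # longitude
    "Y": [37.77, 37.78, None, 37.80],            # latitude (one value missing)
})

start, end = pd.Timestamp("2017-05-31"), pd.Timestamp("2018-05-31")

# Half-open interval [start, end): the first and last toy rows fall outside it.
in_window = (toy["Date_Time"] >= start) & (toy["Date_Time"] < end)
subset = toy.loc[in_window & (toy["Category"] == "LARCENY/THEFT")]

# Drop rows without coordinates before building (lat, lon) pairs for the map.
coords = subset.dropna(subset=["X", "Y"])
x_y = list(zip(coords["Y"], coords["X"]))

print(x_y)
```

Note the order of the pairs: the dataset stores longitude in X and latitude in Y, while map locations are given as (latitude, longitude).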
We create a map of San Francisco using Folium, setting the initial coordinates and zoom level and choosing the "Stamen Toner" tileset for the map's visual style. A standard marker shows the location of SF City Hall. To plot the incidents, we loop through the coordinate pairs and add a circle marker for each one, colored red and with a radius of 1 so that individual incidents remain distinguishable. The result is an interactive visualization of Larceny/Theft incidents in San Francisco.
# Plot the locations on a map
SF_map = folium.Map(location=[37.77919, -122.41914], zoom_start=12, tiles="Stamen Toner")
folium.Marker(location=[37.77919, -122.41914], popup='SF City Hall').add_to(SF_map)
for location in x_y:
    folium.CircleMarker(location=location, radius=1, color='red').add_to(SF_map)
SF_map
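The notebook imports `HeatMapWithTime` but the cells shown here do not use it. As a hedged sketch of how its input could be prepared, the code below groups toy incidents by month into a list of frames, each frame being a list of [lat, lon] points, together with one label per frame. The `toy` dataframe and the `frames` and `labels` names are assumptions for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "Date_Time": pd.to_datetime(["2017-06-01", "2017-06-15", "2017-07-04"]),
    "Y": [37.77, 37.78, 37.79],      # latitude
    "X": [-122.42, -122.41, -122.40],  # longitude
})

# Monthly period for each incident, e.g. 2017-06.
month = toy["Date_Time"].dt.to_period("M")

frames, labels = [], []
for period, group in toy.groupby(month):
    # One frame per month: a list of [lat, lon] points.
    frames.append(group[["Y", "X"]].values.tolist())
    labels.append(str(period))

print(labels)
print(frames)
```

With Folium available, these could then be added to the map with something like `HeatMapWithTime(frames, index=labels).add_to(SF_map)`, which animates the heat map month by month instead of drawing one circle marker per incident.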